Monitoring camera, part association method and program

ABSTRACT

A monitoring camera includes a capturing unit that is configured to capture an image of at least one object within an angle of view, and a processor that is equipped with artificial intelligence and that is configured to detect a plurality of characteristic parts of the object reflected in a captured image input from the capturing unit based on the artificial intelligence. The processor associates, for each of the at least one object, information for specifying each of the plurality of detected characteristic parts with a same object ID.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is based on Japanese Patent Application Nos. 2020-154092 and 2020-154093, both filed on Sep. 14, 2020, the contents of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

The present disclosure relates to a monitoring camera, a part association method and a program.

2. Background Art

JP-A-2017-25621 discloses an entry/exit management system which includes an authentication terminal that reads identification information and a face image of an authentication target from a recording medium, an authentication device that performs authentication, and a door control device that controls the opening and closing of the door based on the verification result of the authentication device. In a predetermined period including a time when the authentication terminal reads the identification information, the authentication device detects a face of a person from the video data captured by the monitoring camera, cuts out an image in the vicinity of the face, and calculates the degree of match between the face image of the authentication target and the extracted face image. In addition, when the identification information matches the permission information and the degree of match is equal to or greater than a predetermined threshold value, the authentication device performs control to open the door by driving the door control device.

SUMMARY OF INVENTION

In JP-A-2017-25621, it is assumed that whether or not opening/closing of the door is permitted is determined, and thus an image used for this determination is only a face image of a person reflected in the video data captured by the monitoring camera. On the other hand, there is a need to search, with high accuracy, for an object (for example, a person) reflected in video data captured by a large number of monitoring cameras installed outdoors, such as in a city, or indoors, such as in a facility. In order to support such a search, it is conceivable to associate and save the object (for example, a person) reflected in the video data captured by the monitoring cameras and the face images thereof. However, as in JP-A-2017-25621, preparing only a face image in preparation for a search for an object (for example, a person) is not sufficient to realize a highly accurate search.

The present disclosure has been made in view of the above-described circumstances, and an object of the present disclosure is to provide a monitoring camera, a part association method, and a program for supporting improvement of search accuracy of one or more objects reflected in video data within an angle of view.

The present disclosure provides a monitoring camera including a capturing unit that captures an image of at least one object in an angle of view; and a processor that is equipped with artificial intelligence and that detects a plurality of characteristic parts of the object reflected in a captured image input from the capturing unit based on the artificial intelligence, wherein the processor associates, for each of the at least one object, information for specifying each of the plurality of detected characteristic parts with a same object ID.

In addition, the present disclosure provides a part association method performed by a monitoring camera that is equipped with artificial intelligence, the part association method including capturing an image of at least one object in an angle of view; detecting a plurality of characteristic parts of the object reflected in an input captured image based on the artificial intelligence; and associating, for each of the at least one object, information for specifying each of the plurality of detected characteristic parts with a same object ID.

The present disclosure provides a monitoring camera including a capturing unit that captures an image of at least one object in an angle of view; and a processor that is equipped with artificial intelligence and that detects a characteristic part of the object reflected in a captured image input from the capturing unit based on the artificial intelligence, wherein the processor determines whether or not a detection part that is the part detected based on the artificial intelligence is a priority part suitable for tracking processing of the object, and uses the priority part as a tracking frame to perform the tracking processing of the object when it is determined that the detection part is the priority part.

The present disclosure provides a tracking frame generation method performed by a monitoring camera equipped with artificial intelligence, the tracking frame generation method comprising capturing an image of at least one object in an angle of view; detecting a characteristic part of the object reflected in an input captured image based on the artificial intelligence; determining whether or not a detection part that is the part detected based on the artificial intelligence is a priority part suitable for tracking processing of the object; and using the priority part as a tracking frame to perform the tracking processing of the object when it is determined that the detection part is the priority part.

According to the present disclosure, it is possible to support the improvement of search accuracy of one or more objects reflected in the video data within the angle of view.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a system configuration example of a monitoring camera system according to Embodiment 1.

FIG. 2 shows an example of a type of an image associated with the same object ID for a person as an object reflected in the data of a captured image.

FIG. 3 is an attribute information table showing an example of a relationship between an image type and an attribute identified by a monitoring camera.

FIG. 4 is a flowchart showing an example of an operation procedure of association processing by the monitoring camera according to Embodiment 1.

FIG. 5 is an explanatory diagram of an example of associating a person and a bicycle as an object reflected in data of a captured image.

FIG. 6 is a flowchart showing an example of an operation procedure of identification processing for each part by the monitoring camera according to Embodiment 1.

FIG. 7 is an explanatory diagram of a generation example of a tracking frame.

FIG. 8 is a flowchart illustrating a detailed operation procedure example of step St11 in FIG. 6.

DETAILED DESCRIPTION OF THE EXEMPLARY EMBODIMENT

Background of Present Disclosure

In JP-A-2017-25621, it is assumed that whether or not opening/closing of the door is permitted is determined, and thus an image used for this determination is only a face image of a person reflected in the video data captured by the monitoring camera. On the other hand, there is a need to search, with high accuracy, for an object (for example, a person) reflected in video data captured by a large number of monitoring cameras installed outdoors, such as in a city, or indoors, such as in a facility. In order to support such a search, it is conceivable to associate and save the object (for example, a person) reflected in the video data captured by the monitoring cameras and the face images thereof. However, as in JP-A-2017-25621, preparing only a face image in preparation for a search for an object (for example, a person) is not sufficient to realize a highly accurate search.

Therefore, in the following Embodiment 1, an example of a monitoring camera, a part association method, and a program for supporting improvement of search accuracy of one or more objects reflected in video data within an angle of view will be described.

On one hand, in JP-A-2017-25621, it is assumed that whether or not opening/closing of the door is permitted is determined, and thus an image used for this determination is only a face image of a person reflected in the video data captured by the monitoring camera. On the other hand, there is a need to track, with high accuracy, the path (so-called moving line) of an object (for example, a person to be monitored) reflected in video data captured by a large number of monitoring cameras installed outdoors, such as in a city, or indoors, such as in a facility. In order to support such tracking, it is conceivable to associate and save the object (for example, a person) reflected in the video data captured by the monitoring cameras and the face images thereof. However, as in JP-A-2017-25621, only using a face image for tracking an object (for example, a person to be monitored) is not sufficient to realize highly accurate tracking.

Therefore, an object of the following Embodiment 1 is to provide a monitoring camera, a tracking frame generation method, and a program for supporting improvement of tracking accuracy of an object reflected in video data within an angle of view.

Hereinafter, embodiments of a monitoring camera, a part association method and a program according to the present disclosure will be described in detail with reference to the drawings as appropriate. However, unnecessarily detailed description may be omitted. For example, detailed description of a well-known matter or repeated description of substantially the same configuration may be omitted. This is to avoid unnecessary redundancy in the following description and to facilitate understanding for those skilled in the art. Incidentally, the accompanying drawings and the following description are provided for those skilled in the art to fully understand the present disclosure, and are not intended to limit the subject matter described in the claims.

FIG. 1 shows a system configuration example of a monitoring camera system 100 according to Embodiment 1. The monitoring camera system 100 includes at least one monitoring camera 10 and a client server 20. The client server 20 and each monitoring camera 10 are communicably connected to each other via a network NW1. The network NW1 may be a wired network (for example, a wired local area network (LAN)) or a wireless network (for example, a wireless LAN such as Wi-Fi (registered trademark), a wireless wide area network (WAN), a fourth generation mobile communication system (4G), or a fifth generation mobile communication system (5G)). Incidentally, the monitoring camera system 100 may include monitoring cameras having the same configuration as the monitoring camera 10, or may further include a monitoring camera whose configuration differs from that of the monitoring camera 10.

The monitoring camera 10 is a computer including artificial intelligence (AI), and captures an image of a monitoring area (for example, indoor or outdoor) designated by an operator of the monitoring camera system 100. The monitoring camera 10 acquires data of the captured image by capturing an image of the monitoring area, and detects a characteristic part of an object (for example, a person) reflected in the data of the captured image based on AI. In the following description, a person is mainly illustrated as an object, but the object is not limited to a person, and may be, for example, a vehicle such as an automobile or a bicycle, or may be a person and a vehicle (see FIG. 5).

The monitoring camera 10 includes an imaging unit 11, a memory 12, a processor 13, a reception unit 14, and a transmission unit 15. Each of the imaging unit 11, the memory 12, the reception unit 14, and the transmission unit 15 is connected to the processor 13 via an internal bus (not shown) so that data can be input or output.

The imaging unit 11 includes at least a lens (not shown) as an optical element and an image sensor (not shown). The lens receives light reflected by an object (an example of a subject) in the monitoring area, and forms an optical image of the subject on a light receiving surface (in other words, an imaging surface) of the image sensor. The image sensor is, for example, a solid-state imaging sensor such as a charge-coupled device (CCD) or a complementary metal oxide semiconductor (CMOS). The image sensor converts the optical image formed on the imaging surface through the lens into an electrical signal at each predetermined time (for example, 1/30 seconds), and transmits the electrical signal to the processor 13. For example, when the predetermined time is 1/30 seconds, the frame rate of the monitoring camera 10 is 30 fps. In addition, the imaging unit 11 may generate the data of the captured image by performing predetermined signal processing on the electrical signal at each predetermined time described above. Incidentally, the processing of generating the data of the captured image may be executed by the processor 13. The imaging unit 11 outputs the data of the captured image to the processor 13.

The memory 12 is configured using, for example, a random access memory (RAM) and a read only memory (ROM), and temporarily holds a program necessary for executing the operation of the monitoring camera 10 and, further, data generated during the operation. The RAM is, for example, a work memory used during the operation of the monitoring camera 10. The ROM stores and holds in advance, for example, a program according to the present disclosure for controlling the monitoring camera 10. In other words, by executing a program stored in the ROM, the processor 13 can execute, on the monitoring camera 10, which is a computer, various types of processing (steps) related to the part association method according to the present disclosure. For example, the memory 12 temporarily stores data of a captured image captured by the imaging unit 11 and data (described later) to be transmitted to the client server 20. Further, the memory 12 may include a flash memory in addition to the RAM and the ROM, and may store data of the captured image or transmission data (described later) to be transmitted to the client server 20. In addition, the memory 12 stores data of a learning model for AI processing (described later) used by an AI processing unit 131 (described later).

The processor 13 is configured using, for example, a central processing unit (CPU), a digital signal processor (DSP), a graphics processing unit (GPU), or a field-programmable gate array (FPGA). The processor 13 functions as a controller that controls the overall operation of the monitoring camera 10, and performs control processing for controlling the operation of each part of the monitoring camera 10, input/output processing of data with each part of the monitoring camera 10, arithmetic processing of data, and storage processing of data. The processor 13 operates according to a program stored in the memory 12. In addition, the processor 13 uses the memory 12 at the time of operation, and temporarily saves the data generated or acquired by the processor 13 in the memory 12. The processor 13 includes an AI processing unit 131 and a detection area/threshold setting unit 132.

The AI processing unit 131 uses the data of the learning model for AI processing read from the memory 12 so as to be executable by the AI processing unit 131 (in other words, based on AI), and executes various types of processing on the data of the captured image input from the imaging unit 11 under the parameters (described later) set by the detection area/threshold setting unit 132. Here, the data of the learning model for AI processing includes, for example, a program that defines the contents of the various types of processing executed by the AI processing unit 131, parameters necessary for the various types of processing, and training (teacher) data. Specifically, the AI processing unit 131 includes a site detection association unit 1311, a tracking unit 1312, a best shot determination unit 1313, and a site identification unit 1314.

Here, the learning processing for generating the data of the learning model for AI processing may be performed using one or more statistical classification techniques. Examples of the statistical classification techniques include linear classifiers, support vector machines, quadratic classifiers, kernel density estimation, decision trees, artificial neural networks, Bayesian techniques and/or networks, hidden Markov models, binary classifiers, multi-class classifiers, clustering techniques, random forest techniques, logistic regression techniques, linear regression techniques, gradient boosting techniques, and the like. However, the statistical classification technique used is not limited thereto. In addition, the generation of the data of the learning model may be performed by the AI processing unit 131 in the monitoring camera 10, or may be performed by the client server 20, for example.

The site detection association unit 1311 detects, based on the AI, a plurality of characteristic parts (sites) of the object reflected in the data of the captured image input from the imaging unit 11. The site detection association unit 1311 associates (links) the plurality of parts detected from the same object with an object ID (ID: identification) serving as identification information of the object (refer to FIG. 2).

FIG. 2 is a diagram illustrating an example of a type of an image associated with the same object ID for a person PS1 as an object reflected in the data of a captured image IMG1. The captured image IMG1 illustrated in FIG. 2 shows a state in which a plurality of persons are crossing a pedestrian crossing. The site detection association unit 1311 detects each individual object (for example, a person) reflected in the captured image IMG1, detects the characteristic parts of each person (for example, the person PS1), and associates the characteristic parts with the identification information of the same person.

Here, the characteristic parts of the object to be detected are parts indicating the physical features of the person, detected in order to improve the search accuracy of the person PS1 by the client server 20; they are, for example, the whole body of the person PS1, the upper part of the shoulder of the person PS1, and the face of the person PS1. That is, upon detecting the person PS1, the site detection association unit 1311 generates information (for example, coordinates indicating a position in the captured image or a cut-out image of each portion) for specifying the whole body frame portion WK1, the scapula upper frame portion WK2, and the face frame portion WK3 from the data of the captured image IMG1. Further, the site detection association unit 1311 uses the object ID (for example, A001) that is the identification information of the detected person PS1 to assign the same object ID (for example, A001) to the information (see above) for specifying the whole body frame portion WK1, the scapula upper frame portion WK2, and the face frame portion WK3, thereby associating the information. Accordingly, the site detection association unit 1311 can associate three different parts, the whole body frame portion WK1, the scapula upper frame portion WK2, and the face frame portion WK3, with the same person PS1 as characteristic parts for searching for the same person (for example, the person PS1) reflected in the data of the captured image IMG1, and thus it is possible to improve the search accuracy of the person PS1 by the client server 20.
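As a non-authoritative illustration of this association, the following Python sketch keeps the parts detected for one person under a single object ID. The names DetectedObject, associate_parts, and new_object_id, as well as the coordinate values, are assumptions introduced only for this example; the embodiment itself does not specify a data structure.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height) in the captured image

@dataclass
class DetectedObject:
    object_id: str
    # parts keyed by name, e.g. "whole_body", "scapula_upper", "face"
    parts: Dict[str, BoundingBox] = field(default_factory=dict)

_id_counter = count(1)

def new_object_id(prefix: str = "A") -> str:
    """Issue a new object ID such as 'A001'."""
    return f"{prefix}{next(_id_counter):03d}"

def associate_parts(detections: Dict[str, BoundingBox]) -> DetectedObject:
    """Assign one object ID to every characteristic part detected for the same person."""
    obj = DetectedObject(object_id=new_object_id())
    obj.parts.update(detections)
    return obj

# Example corresponding to person PS1 in FIG. 2 (coordinates are made up):
person_ps1 = associate_parts({
    "whole_body": (120, 80, 90, 260),    # whole body frame portion WK1
    "scapula_upper": (140, 85, 50, 60),  # scapula upper frame portion WK2
    "face": (150, 90, 30, 35),           # face frame portion WK3
})
print(person_ps1.object_id, list(person_ps1.parts))  # -> A001 ['whole_body', 'scapula_upper', 'face']
```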

The tracking unit 1312 generates a tracking frame for the tracking processing described later, using the detection result by the site detection association unit 1311 and the result of the association processing (see FIG. 7). A method of generating the tracking frame will be described later with reference to FIG. 7. In addition, the tracking unit 1312 performs the tracking processing for tracking the path (so-called moving line) of the object reflected in the data of the captured image input from the imaging unit 11, using the detection result by the site detection association unit 1311 and the result of the association processing (for example, the coordinate information of each of the plurality of parts associated with the object ID of the object reflected in the data of the captured image).

The best shot determination unit 1313 receives the detection result and the result of the association processing by the site detection association unit 1311, and the result of the tracking processing of the object by the tracking unit 1312. Based on the detection result and the association result by the site detection association unit 1311 and the result of the tracking processing of the object by the tracking unit 1312, the best shot determination unit 1313 determines whether or not the part detected by the site detection association unit 1311 is a best shot having an image quality suitable for the identification processing of the attribute information.

Here, whether or not the part detected by the site detection association unit 1311 is the best shot can be determined as follows. For example, when at least one of the whole body of the person PS1, the scapula upper portion of the person PS1, and the face of the person PS1 is detected by the site detection association unit 1311 in the frame, the best shot determination unit 1313 determines that the detected part is the best shot. On the other hand, when none of the whole body of the person PS1, the scapula upper portion of the person PS1, and the face of the person PS1 is detected by the site detection association unit 1311 in the frame, the best shot determination unit 1313 determines that there is no best-shot part. In addition, when the whole body of the person PS1, the scapula upper portion of the person PS1, and the face of the person PS1 are detected in the frame and the detection positions (that is, the coordinates in the captured image) are in the vicinity of the center (in other words, not in the vicinity of the edge of the captured image), the best shot determination unit 1313 may determine that each part is the best shot.
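The two criteria above can be sketched as follows. This is only an illustrative Python sketch: the function name is_best_shot and the margin_ratio value defining "the vicinity of the edge" are assumptions, since the embodiment does not quantify how close to the edge a part may lie.

```python
from typing import Dict, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height)

def is_best_shot(parts: Dict[str, BoundingBox],
                 image_size: Tuple[int, int],
                 margin_ratio: float = 0.1) -> bool:
    """Return True if the detected parts qualify as a best shot.

    At least one of the whole body, scapula upper portion, or face must be
    detected, and each detected part should lie away from the image edges.
    """
    required = {"whole_body", "scapula_upper", "face"}
    detected = required & parts.keys()
    if not detected:
        return False  # none of the characteristic parts were found in this frame

    img_w, img_h = image_size
    mx, my = img_w * margin_ratio, img_h * margin_ratio
    for name in detected:
        x, y, w, h = parts[name]
        cx, cy = x + w / 2, y + h / 2
        # reject parts whose center lies in the vicinity of the image edge
        if not (mx <= cx <= img_w - mx and my <= cy <= img_h - my):
            return False
    return True
```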

When the site identification unit 1314 receives, from the best shot determination unit 1313, the determination result that the part detected by the site detection association unit 1311 is the best shot, the site identification unit 1314 cuts out an image (see FIG. 2) of the characteristic part of the object determined to be the best shot from the frame input from the imaging unit 11 based on the object ID, and identifies the attribute information (see FIG. 3) for each image of the cut-out part based on AI (for example, deep learning). That is, the site identification unit 1314 identifies, based on AI (for example, deep learning), the attribute information (see FIG. 3) of the cut-out image (see FIG. 2) of the characteristic part of the object cut out by the site detection association unit 1311 (for example, it analyzes what kind of content the attribute information has).

Here, the attribute information will be described with reference to FIG. 3. FIG. 3 is an attribute information table showing an example of a relationship between an image type and an attribute identified by the monitoring camera 10. In FIG. 3, the object is a person, and a whole body frame image and a scapula upper frame image are illustrated as cut-out images (see FIG. 2) of a part to be subjected to the identification processing by the site identification unit 1314.

The site identification unit 1314 identifies and extracts characteristicelements (for example, the color of clothes, the type of clothes,presence/absence of bag, and presence/absence of muffler) reflected inthe whole body frame image of the object as attribute information.Incidentally, the type of clothing indicates the length of the sleeve ofthe clothing to which the object (for example, a person) is attached.The attribute information, which is a characteristic element, is asearch item that can be used as a search condition (that is, acharacteristic element of a person obtained from an image showing thewhole body of a person) at the time of searching for a person by theclient server 20. Accordingly, when such attribute information is inputas a search condition, the efficiency of search can be increased in thatthe load of the search processing of the client server 20 can bereduced.

In addition, the site identification unit 1314 identifies and extracts characteristic elements (for example, hairstyle, hair color, beard, presence/absence of a mask, presence/absence of glasses, age, and gender) reflected in the scapula upper frame image of the object as attribute information. The attribute information, which is a characteristic element, is a search item that can be used as a search condition (that is, a characteristic element of a person obtained from an image showing the scapula upper portion of a person) at the time of searching for a person by the client server 20. Accordingly, when such attribute information is input as a search condition, the efficiency of the search can be increased in that the load of the search processing of the client server 20 can be reduced.

The detection area/threshold value setting unit 132 acquires setting data of a masking area (that is, an area to be excluded from detection of an object) transmitted from the client server 20 via the reception unit 14, and sets the setting data in the AI processing unit 131. The setting data is a parameter used at the time of the AI processing by the AI processing unit 131. For example, when the setting data of the masking area is set in the AI processing unit 131, the site detection association unit 1311 uses the area obtained by excluding the masking area from the monitoring area within the angle of view of the monitoring camera 10 as the area for detecting the object.

In addition, the detection area/threshold value setting unit 132 acquires setting data of a threshold value for detection transmitted from the client server 20 via the reception unit 14 and sets the setting data in the AI processing unit 131. The setting data is a parameter used at the time of the AI processing by the AI processing unit 131. For example, when the setting data of the threshold value is set in the AI processing unit 131, the site detection association unit 1311 outputs the detection result when the score (in other words, the probability indicating the detection accuracy) obtained as the AI processing result exceeds the threshold value.
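A minimal sketch of this threshold behavior is shown below, assuming scores normalized to the range 0.0 to 1.0; the function name and the data layout are placeholders, and the masking-area parameter described above is omitted.

```python
from typing import Iterable, List, Tuple

Detection = Tuple[str, float]  # (part name, AI score in the range 0.0 to 1.0)

def filter_by_threshold(detections: Iterable[Detection],
                        threshold: float) -> List[Detection]:
    """Keep only detections whose AI score exceeds the configured threshold value."""
    return [d for d in detections if d[1] > threshold]

print(filter_by_threshold([("face", 0.92), ("whole_body", 0.41)], threshold=0.5))
# -> [('face', 0.92)]
```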

The reception unit 14 is configured using a communication circuit for receiving data from the network NW1, and receives, for example, data transmitted from the client server 20 via the network NW1. For example, the reception unit 14 receives the data of the detection area transmitted from the client server 20 or the data of the threshold value for detection of the part of the object using the AI, and outputs the data to the processor 13.

The transmission unit 15 is configured using a communication circuit for transmitting data to the network NW1, and transmits, for example, data generated by the processor 13 via the network NW1. For example, the transmission unit 15 transmits the transmission data generated by the processor 13 (for example, the identification result of the attribute information for each part of the object and the information related to the best shot used for the identification processing) to the client server 20 via the network NW1.

The client server 20 is a computer used by a user of the monitoring camera system 100 operated by the operator, and transmits and receives data to and from the monitoring camera 10 via the network NW1. The client server 20 can transmit setting data (see above), which is an example of a parameter of the monitoring camera 10, to the monitoring camera 10 via the network NW1 and set the setting data. The setting data is, for example, setting data of a masking area or setting data of a threshold value for detection of an object by the AI of the monitoring camera 10. In addition, the client server 20 can extract or generate image data or a thumbnail of image data satisfying the search condition (for example, the attribute information illustrated in FIG. 3) input by the operation of the user by referring to the storage unit 26, and display the image data or the thumbnail of the image data on the display unit 27.

The client server 20 includes an input unit 21, a memory 22, a processor 23, a reception unit 24, a transmission unit 25, a storage unit 26, and a display unit 27. Each of the input unit 21, the memory 22, the reception unit 24, the transmission unit 25, the storage unit 26, and the display unit 27 is connected to the processor 23 via an internal bus (not shown) such that data can be input or output. Incidentally, instead of the client server 20, a personal computer (PC), a smartphone, or a tablet may be used as long as it provides the configuration of a computer including the input unit 21, the memory 22, the processor 23, the reception unit 24, the transmission unit 25, the storage unit 26, and the display unit 27.

The input unit 21 is a user interface that detects an input operation by the user, and is configured using, for example, a mouse, a keyboard, a touch panel, or the like. The input unit 21 receives data of various types of input items (for example, search conditions of an object) specified by an input operation of the user, and transmits the data to the processor 23.

The memory 22 is configured using, for example, a RAM and a ROM, and temporarily holds a program necessary for executing the operation of the client server 20 and, further, data generated during the operation. The RAM is, for example, a work memory used during the operation of the client server 20. The ROM stores and holds in advance, for example, a program for controlling the client server 20. In other words, the processor 23 can execute various types of processing (steps) on the client server 20, which is a computer, by executing a program stored in the ROM. For example, the memory 22 stores a program for performing search processing of image data or a thumbnail of an object that satisfies the search condition input by the input unit 21.

The processor 23 is configured using, for example, a CPU, a DSP, a GPU, or an FPGA. The processor 23 functions as a controller that controls the entire operation of the client server 20, and performs control processing for controlling the operation of each unit of the client server 20, input/output processing of data with each unit of the client server 20, arithmetic processing of data, and storage processing of data. The processor 23 operates according to the program stored in the memory 22. The processor 23 uses the memory 22 during operation, and temporarily saves the data generated or acquired by the processor 23 in the memory 22. The processor 23 includes a person search unit 231 and a search output unit 232.

The person search unit 231 performs search processing of the image data or the thumbnail of the object that satisfies the search condition input via the input unit 21 by the operation of the user, and sends the result of the search processing to the search output unit 232.

The search output unit 232 outputs the result of the search processing from the person search unit 231 to the display unit 27 and displays the result.

The reception unit 24 is configured using a communication circuit for receiving data from the network NW1, and receives, for example, transmission data transmitted from the monitoring camera 10 (for example, the identification result of the attribute information of each part of the object and information on the best shot used for the identification processing) via the network NW1. For example, the reception unit 24 receives the transmission data transmitted from the monitoring camera 10 and outputs the transmission data to the processor 23.

The transmission unit 25 is configured using a communication circuit for transmitting data to the network NW1, and transmits, for example, data generated by the processor 23 via the network NW1. For example, the transmission unit 25 transmits the data of the detection area generated by the processor 23 or the data of the threshold value for detection of the part of the object using the AI to the monitoring camera 10 via the network NW1.

The storage unit 26 is configured using, for example, a flash memory, a hard disk drive (HDD), or a solid state drive (SSD). The storage unit 26 stores (saves) the transmission data transmitted from one or more monitoring cameras 10 in association with the identification information of the monitoring camera 10 of the transmission source.

The display unit 27 is configured using, for example, a display such as a liquid crystal display (LCD) or an organic electroluminescence (EL) display. The display unit 27 displays the data of the search result generated by the processor 23 based on the operation of the user.

Next, an example of an operation procedure of the association processing of the monitoring camera 10 according to Embodiment 1 will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of an operation procedure of association processing by the monitoring camera 10 according to Embodiment 1. The operation procedure illustrated in FIG. 4 is executed by the site detection association unit 1311 of the AI processing unit 131 of the processor 13 of the monitoring camera 10 each time the data of the captured image is input to the processor 13 from the imaging unit 11.

In FIG. 4, the AI processing unit 131 (for example, the site detection association unit 1311) detects, based on AI, a plurality of characteristic parts (sites) of an object (for example, a person) reflected in the data of the captured image input from the imaging unit 11 (St1). In step St1, as a detection result based on AI, for example, as shown in FIG. 2, information (for example, coordinates indicating the position in the captured image, or a cut-out image of each portion) for specifying the whole body frame portion WK1, the scapula upper frame portion WK2, and the face frame portion WK3 is generated.

The AI processing unit 131 assigns a new object ID (for example, “A001” illustrated in FIG. 2) to the information for specifying the main site (for example, the scapula upper portion) of the object detected in step St1 (for example, the coordinates indicating the position of the main site in the captured image or the cut-out image of the main site) (St2). Here, the scapula upper portion is used as the main site (main part) when the object is a person because this part of the person reflected in the data of the captured image is detected relatively stably, and because the number of attribute information items obtained by the identification processing of the corresponding scapula upper frame image is the largest (see FIG. 3); its usability as the characteristic part of the person as the object is therefore high.

In step St3 following step St2, the AI processing unit 131 associates the main site to which the object ID is assigned in step St2 with the other characteristic parts (St3 to St5). Steps St3 to St5 are executed for each main site of the object reflected in the data of the captured image input from the imaging unit 11. Incidentally, when, in the data of the captured image input from the imaging unit 11, a plurality of parts cannot be detected from the same object in step St1 and only a single part is detected, the association processing corresponding to the object cannot be performed, and thus the processing of steps St3 to St5 for the object is omitted.

The AI processing unit 131 determines whether or not the association processing is possible, that is, whether the main site of the object detected in step St1 and another part detected in step St1 come from the same object (St4). For example, the AI processing unit 131 determines whether or not there is another site (for example, a whole body frame portion or a face frame portion) detected from the same object (for example, a person) having the main site (for example, a scapula upper frame portion) detected in step St1 (St4). In a case where there is no other part detected from the same object having the main site (St4, NO), since the association processing related to the object is not possible, the AI processing unit 131 executes the association processing relating to another object reflected in the data of the captured image currently being processed. That is, the AI processing unit 131 determines whether or not the association processing of the main site and the other part can be performed for each object detected in step St1, and executes the association processing when it is determined that the association processing can be performed.

On the other hand, in a case where there is another part detected from the same object having the main site (St4, YES), the AI processing unit 131 performs the association processing by assigning the same object ID as the object ID assigned to the information for specifying the main site to the information (for example, the coordinates indicating the position of the other part in the captured image or the cut-out image of the other part) for specifying the other part detected from the same object having the main site (St5).

Incidentally, in a case where only a single part (for example, a whole body, a scapula upper portion, or a face) is detected for an object in step St1, the AI processing unit 131 assigns a new object ID to the information (for example, a coordinate indicating the position of the part in the captured image or a cut-out image of the part) for specifying that single part (St6).
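The flow of steps St1 to St6 can be condensed into the following Python sketch. It is an approximation of the flowchart, not the flowchart itself: the input format, the function name associate, and the handling of an object with several parts but no main site (a case FIG. 4 does not spell out) are assumptions for illustration.

```python
from itertools import count
from typing import Dict, List, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height)
_ids = count(1)

def _new_id() -> str:
    return f"A{next(_ids):03d}"

def associate(detected_objects: List[Dict[str, BoundingBox]]) -> List[Dict[str, object]]:
    """Associate the parts of each detected object with one object ID (St2 to St6).

    Each element of detected_objects holds the parts found for one object in St1,
    keyed by part name ('scapula_upper', 'whole_body', 'face').
    """
    results = []
    for parts in detected_objects:
        if len(parts) == 1:
            # St6: only a single part was detected; it still receives a new object ID
            results.append({"object_id": _new_id(), "parts": dict(parts)})
            continue
        if "scapula_upper" not in parts:
            # No main site was detected for this object; FIG. 4 does not associate
            # such parts, so this sketch simply moves on to the next object.
            continue
        # St2: assign a new ID to the main site; St5: reuse the same ID for the other parts
        results.append({"object_id": _new_id(), "parts": dict(parts)})
    return results

print(associate([
    {"scapula_upper": (140, 85, 50, 60), "whole_body": (120, 80, 90, 260), "face": (150, 90, 30, 35)},
    {"face": (400, 60, 28, 32)},
]))
```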

In addition, in Embodiment 1, the site detection association unit 1311 of the AI processing unit 131 may associate main sites of a plurality of objects reflected in the data of the captured image with each other (see FIG. 5). FIG. 5 is an explanatory diagram of an example of associating a person and a bicycle as objects reflected in data of a captured image IMG2. That is, in the example of FIG. 5, a pair of one person and one bicycle constitutes a plurality of objects, and three such pairs are shown.

Specifically, as illustrated in FIG. 5, three people and three bicycles are reflected in the data of the captured image IMG2. The bicycle used by each person is located close to that person. When such data of the captured image IMG2 is input, the AI processing unit 131 detects each of the plurality of objects reflected in the data of the captured image IMG2 based on the AI, and specifies the part for each of the detected objects. For example, the AI processing unit 131 specifies the whole body frame portions WK11, WK12, and WK13 of the persons and the whole frame portions WK14, WK15, and WK16 of the bicycles.

For example, based on the fact that the whole body frame portion WK11 of a person and the whole frame portion WK14 of a bicycle, the whole body frame portion WK12 of a person and the whole frame portion WK15 of a bicycle, and the whole body frame portion WK13 of a person and the whole frame portion WK16 of a bicycle are each close to each other, the AI processing unit 131 performs the association processing on each of these pairs. For example, the AI processing unit 131 assigns the object ID “B001” to the whole body frame portion WK11 of the person and the whole frame portion WK14 of the bicycle, assigns the object ID “B002” to the whole body frame portion WK12 of the person and the whole frame portion WK15 of the bicycle, and assigns the object ID “B003” to the whole body frame portion WK13 of the person and the whole frame portion WK16 of the bicycle. Accordingly, the monitoring camera 10 can associate a pair of objects of different types (for example, a person and a bicycle) reflected in the data of the captured image IMG2 with the same object ID.
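A minimal sketch of this distance-based pairing is shown below. The embodiment only states that frames "close to each other" are associated, so the greedy nearest-neighbour matching on frame centers, the function name pair_by_distance, and the coordinate values are assumptions made for this example.

```python
import math
from typing import Dict, List, Tuple

BoundingBox = Tuple[int, int, int, int]  # (x, y, width, height)

def _center(box: BoundingBox) -> Tuple[float, float]:
    x, y, w, h = box
    return (x + w / 2, y + h / 2)

def pair_by_distance(person_frames: Dict[str, BoundingBox],
                     bicycle_frames: Dict[str, BoundingBox],
                     id_prefix: str = "B") -> List[Tuple[str, str, str]]:
    """Pair each person frame with the nearest bicycle frame and give the pair one object ID."""
    pairs = []
    remaining = dict(bicycle_frames)
    for i, (p_name, p_box) in enumerate(sorted(person_frames.items()), start=1):
        if not remaining:
            break
        # choose the bicycle whose frame center is closest to this person's frame center
        b_name = min(remaining, key=lambda b: math.dist(_center(p_box), _center(remaining[b])))
        remaining.pop(b_name)
        pairs.append((f"{id_prefix}{i:03d}", p_name, b_name))
    return pairs

# Example corresponding to FIG. 5 (coordinates are made up):
people = {"WK11": (50, 100, 60, 160), "WK12": (200, 90, 60, 165), "WK13": (360, 95, 60, 158)}
bikes = {"WK14": (45, 180, 90, 90), "WK15": (195, 175, 90, 92), "WK16": (355, 178, 90, 88)}
print(pair_by_distance(people, bikes))
# -> [('B001', 'WK11', 'WK14'), ('B002', 'WK12', 'WK15'), ('B003', 'WK13', 'WK16')]
```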

Next, an example of an operation procedure of identification processing of the main site or the other part of the object by the monitoring camera 10 according to Embodiment 1 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of an operation procedure of identification processing for each site by the monitoring camera 10 according to Embodiment 1. The operation procedure illustrated in FIG. 6 is executed by each of the tracking unit 1312, the best shot determination unit 1313, and the site identification unit 1314 of the AI processing unit 131 of the processor 13 of the monitoring camera 10, each time the data of the captured image is input to the processor 13 from the imaging unit 11, after the operation procedure illustrated in FIG. 4. Therefore, before the operation procedure of FIG. 6 is started, the association processing result relating to the object described with reference to FIG. 4 has been obtained.

In FIG. 6, the AI processing unit 131 performs, in the tracking unit 1312, the tracking processing (tracking) for tracking the path (so-called moving line) of the object, using the association processing result (see FIG. 4) relating to one or more objects (for example, a person) reflected in the data of the captured image input from the imaging unit 11 (St11). The details of the operation in step St11 will be described with reference to FIGS. 7 and 8.

The AI processing unit 131 determines, in the best shot determination unit 1313, whether or not the main site or the other part detected by the site detection association unit 1311 is the best shot suitable for the identification processing (St12). Incidentally, since the method of determining whether or not the main site or the other part is the best shot is as described above, the description thereof is omitted here. When it is determined that the main site or the other part is not the best shot (St13, NO), since the data (frame) of the captured image is not suitable for the identification processing of the main site or the other part of the object, the operation procedure of FIG. 6 related to the data (frame) of the captured image ends.

When it is determined that the main site or the other part is the best shot (St13, YES), the AI processing unit 131 executes the identification processing for each main site or other part (St14 to St16). Specifically, the AI processing unit 131 cuts out an image of the corresponding site from the data of the captured image for each main site or other part (in other words, each part) determined to be the best shot (St15). That is, the AI processing unit 131 generates a cut-out image of the main site or the other part (St15).

The AI processing unit 131 executes the identification processing of the cut-out image based on AI (for example, deep learning) using the cut-out image of the main site or the other part generated in step St15 and the attribute information table (for example, a table that defines the relationship between the type of the cut-out image and the attribute information corresponding to the type; see FIG. 3) (St16). For example, when the type of the cut-out image is the scapula upper frame image, the AI processing unit 131 extracts the contents of the attribute information (for example, hairstyle, hair color, beard, presence/absence of a mask, presence/absence of glasses, age, and gender) corresponding to the scapula upper frame image. For example, in a case where the type of the cut-out image is the whole body frame image, the AI processing unit 131 extracts the contents of the attribute information (for example, color of clothes, type of clothes, presence/absence of a bag, and presence/absence of a muffler) corresponding to the whole body frame image.
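The lookup driven by the attribute information table can be sketched as follows. The table contents mirror the attributes listed above for FIG. 3, but ATTRIBUTE_TABLE, identify_attributes, and the stub run_attribute_classifier are illustrative names only; the deep-learning classifier itself is not shown.

```python
from typing import Dict, List

# Attribute information table in the sense of FIG. 3: which attributes are
# identified for each type of cut-out image.
ATTRIBUTE_TABLE: Dict[str, List[str]] = {
    "whole_body": ["clothing color", "clothing type", "bag", "muffler"],
    "scapula_upper": ["hairstyle", "hair color", "beard", "mask", "glasses", "age", "gender"],
}

def run_attribute_classifier(cutout_image, attribute: str) -> str:
    """Stub standing in for the AI-based identification of one attribute."""
    return "unknown"

def identify_attributes(cutout_type: str, cutout_image) -> Dict[str, str]:
    """Look up which attributes apply to this cut-out image type and identify each one."""
    results = {}
    for attribute in ATTRIBUTE_TABLE.get(cutout_type, []):
        results[attribute] = run_attribute_classifier(cutout_image, attribute)
    return results

print(identify_attributes("scapula_upper", cutout_image=None))
```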

The processor 13 acquires the identification result data of steps St14 to St16 executed by the AI processing unit 131, and generates transmission data including the identification result data and the best shot information indicating the information of the best shot used for the identification processing. The best shot information includes, for example, at least date and time information indicating when the captured image showing the part determined to be the best shot was captured, an ID of the best shot, and position information such as coordinates indicating the position in the captured image of the part determined to be the best shot. The identification result data includes, for example, at least a result of the identification processing by the AI processing unit 131 (for example, data indicating the content of each attribute information item and a score indicating the identification processing accuracy of the AI). The processor 13 transmits the generated transmission data to the client server 20 via the transmission unit 15 and the network NW1 (St17).
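One possible shape of this transmission data is sketched below. The embodiment does not specify a wire format, so the JSON layout, the field names, and the best-shot ID scheme are assumptions made only to illustrate which pieces of information travel to the client server 20 in step St17.

```python
import json
from datetime import datetime, timezone

def build_transmission_data(object_id: str,
                            attributes: dict,
                            score: float,
                            best_shot_box: tuple) -> str:
    """Assemble the transmission data for step St17 as JSON (illustrative format)."""
    payload = {
        "identification_result": {
            "object_id": object_id,
            "attributes": attributes,  # content of each attribute information item
            "score": score,            # AI identification processing accuracy
        },
        "best_shot_info": {
            "captured_at": datetime.now(timezone.utc).isoformat(),  # date and time of capture
            "best_shot_id": f"{object_id}-bs1",                     # hypothetical ID scheme
            "position": best_shot_box,  # coordinates of the best-shot part in the image
        },
    }
    return json.dumps(payload)

print(build_transmission_data("A001", {"gender": "male", "glasses": "present"}, 0.87, (140, 85, 50, 60)))
```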

Next, an operation procedure example of the tracking processing for tracking the path (so-called moving line) of the object by the monitoring camera 10 will be described with reference to FIGS. 7 and 8. FIG. 7 is an explanatory diagram of a generation example of the tracking frame. FIG. 8 is a flowchart illustrating a detailed operation procedure example of step St11 in FIG. 6. The operation procedure illustrated in FIG. 8 is executed mainly by the tracking unit 1312 of the AI processing unit 131 of the processor 13 of the monitoring camera 10 each time the data of the captured image is input from the imaging unit 11 to the processor 13, after the operation procedure illustrated in FIG. 4. Therefore, before the operation procedure of FIG. 8 is started, the association processing result relating to the object described with reference to FIG. 4 has been obtained.

In FIG. 8, the AI processing unit 131 generates tracking frames C1 to C4 (refer to FIG. 7) for the tracking processing for each object, using the detection result by the site detection association unit 1311 and the association processing result (St21 to St25).

Here, an example of a method of generating a tracking frame by the tracking unit 1312 will be described with reference to FIG. 7. There are four methods of generating a tracking frame, depending on what is detected: (1) a case where a scapula upper portion of an object (for example, a person) is detected, (2) a case where only a face of an object (for example, a person) is detected, (3) a case where only a whole body of an object (for example, a person) is detected, and (4) a case where only a whole body and a face of an object (for example, a person) are detected. In the description of FIG. 7, the main site of the object (for example, a person) is the scapula upper portion, as in the description of FIG. 4.

(Method 1) In a Case Where a Scapula Upper Portion of an Object (for Example, a Person) is Detected

In a case where the scapula upper frame B1 indicating the scapula upper portion is detected by the site detection association unit 1311, for example, the tracking unit 1312 employs the same region as that of the scapula upper frame B1 as the tracking frame C1 to generate data of the tracking frame C1 (an example of the tracking frame information). The data of the tracking frame C1 is used for the tracking processing of step St26, which will be described later.

(Method 2) In a Case Where Only the Face of the Object (for Example, a Person) is Detected

In a case where only the face frame B2 indicating the face is detected by the site detection association unit 1311, for example, the tracking unit 1312 employs a region obtained by enlarging the face frame B2 by a factor of two as the tracking frame C2 to generate data of the tracking frame C2 (an example of tracking frame information). The data of the tracking frame C2 is used for the tracking processing of step St26, which will be described later.

(Method 3) In a Case Where Only the Whole Body of the Object (for Example, a Person) is Detected

In a case where only the whole body frame B3 indicating the whole body is detected by the site detection association unit 1311, for example, the tracking unit 1312 employs, as the tracking frame C3, a region obtained by reducing the width of the whole body frame B3 to 0.6 times with respect to the width, increasing the width of the whole body frame B3 by 1.07 times with respect to the height, matching the center position in the X direction (see FIG. 7), which indicates the horizontal direction, with the X-direction coordinates of the whole body frame B3, and matching the center position in the Y direction (see FIG. 7), which indicates the vertical direction, with the coordinates moved in the Y direction by 0.2 times the height (Y direction) of the whole body frame B3 from the position of the upper end of the whole body frame B3, and generates data of the tracking frame C3 (an example of the tracking frame information). The data of the tracking frame C3 is used for the tracking processing of step St26, which will be described later.

(Method 4) In a Case Where Only the Whole Body and the Face of the Object (for Example, a Person) are Detected

In a case where only the whole body frame B3 and the face frame B2 are detected by the site detection association unit 1311, the tracking unit 1312 employs, as the tracking frame C4, a region obtained by averaging the region of the tracking frame C2 based on the detection of only the face and the region of the tracking frame C3 based on the detection of only the whole body, and generates the data of the tracking frame C4 (an example of the tracking frame information). The data of the tracking frame C4 is used for the tracking processing of step St26, which will be described later.

The AI processing unit 131 determines, based on the result of the association processing relating to the object described with reference to FIG. 4, whether or not a main site (for example, the scapula upper portion when the object is a person), which is detected relatively stably and easily in the object, has been detected (St22). When it is determined that the main site of the object has been detected (St22, YES), the AI processing unit 131 generates a tracking frame (for example, the tracking frame C1) from the coordinates (in other words, the position) of the main site in the data of the captured image input from the imaging unit 11 with reference to (Method 1) described above (St23).

On the other hand, when it is determined that the main site of the object has not been detected (St22, NO), the AI processing unit 131 estimates the position (coordinates) of the main site (for example, the scapula upper portion) from a site (for example, the whole body or the face) other than the main site of the object with reference to one of the above-described (Method 2) to (Method 4) (St24). Further, the AI processing unit 131 generates a tracking frame from the position (coordinates) of the main site estimated in step St24 with reference to one of (Method 2) to (Method 4) described above (St25).
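The priority-part decision of steps St22 to St25 and the four generation methods can be summarised in the following Python sketch. Methods 1, 2, and 4 follow the description above; for Method 3, only the unambiguous parts of the description (0.6 times the width, the center offset of 0.2 times the height below the top edge) are used, and the frame height is assumed to be 0.4 times the body height, so that part of the geometry is an approximation rather than the formula of the embodiment. The function and part names are placeholders.

```python
from typing import Dict, Optional, Tuple

BoundingBox = Tuple[float, float, float, float]  # (x, y, width, height)

def generate_tracking_frame(parts: Dict[str, BoundingBox]) -> Optional[BoundingBox]:
    """Generate a tracking frame from the parts associated with one object (St22 to St25)."""
    if "scapula_upper" in parts:
        # Method 1: the priority part itself is used as the tracking frame (C1)
        return parts["scapula_upper"]

    if "face" in parts and "whole_body" not in parts:
        # Method 2: enlarge the face frame by a factor of two about its center (C2)
        x, y, w, h = parts["face"]
        return (x - w / 2, y - h / 2, 2 * w, 2 * h)

    if "whole_body" in parts:
        # Method 3 (approximate): estimate the scapula-upper region from the whole body (C3)
        x, y, w, h = parts["whole_body"]
        tw, th = 0.6 * w, 0.4 * h        # assumed height factor
        cx, cy = x + w / 2, y + 0.2 * h  # X center aligned with B3, Y center 0.2x height below its top
        c3 = (cx - tw / 2, cy - th / 2, tw, th)
        if "face" in parts:
            # Method 4: average the face-based and whole-body-based regions (C4)
            fx, fy, fw, fh = parts["face"]
            c2 = (fx - fw / 2, fy - fh / 2, 2 * fw, 2 * fh)
            return tuple((a + b) / 2 for a, b in zip(c2, c3))
        return c3

    return None  # no usable part detected in this frame

# Example: only the whole body and the face were detected, so Method 4 applies
print(generate_tracking_frame({"whole_body": (120, 80, 90, 260), "face": (150, 90, 30, 35)}))
```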

The AI processing unit 131 uses the tracking frames generated in steps St22 to St25 to execute, in the tracking unit 1312, the tracking processing (tracking) for tracking the path (so-called moving line) of the object reflected in the data of the captured image (St26). That is, since the tracking frame is generated in order to stably detect the path (so-called moving line) of the object to be tracked in the tracking processing, the tracking unit 1312 of the AI processing unit 131 can stably perform the tracking processing of the object by capturing, through image analysis, the change in the position of the tracking frame for each object reflected in the data of the captured image.

As described above, the monitoring camera 10 according to Embodiment 1 includes the imaging unit 11 that images at least one object (for example, a person) in the angle of view, and the processor 13 that is equipped with artificial intelligence (AI) and that detects a plurality of characteristic parts (for example, the whole body, the scapula upper portion, and the face) of the object reflected in the data of the captured image input from the imaging unit 11 based on the artificial intelligence. The processor 13 associates the information for specifying each of the plurality of detected parts (for example, coordinates indicating the position of the part in the captured image, or a cut-out image of the part) using the same object ID corresponding to the plurality of parts.

Accordingly, the monitoring camera 10 can accurately associate a plurality of characteristic parts related to the object reflected in the data (video data) of the captured image within the angle of view set in the monitoring area, and thus it is possible to support improvement of the search accuracy of one or more objects reflected in the video data within the angle of view.

In addition, the processor 13 executes the tracking processing of the object corresponding to the object ID by using the association processing result (for example, information for specifying the positions of the whole body frame portion, the scapula upper frame portion, the face frame portion, and the like) of each of the plurality of parts associated with the same object ID. Accordingly, the monitoring camera 10 can accurately and successively capture the path (so-called moving line) of the object reflected in the data of the captured image input from the imaging unit 11.

In addition, the processor 13 determines whether or not the part to which the object ID is assigned is the best shot suitable for the identification processing of the attribute information of the object. When it is determined that the part to which the object ID is assigned is the best shot, the processor 13 cuts out the part determined to be the best shot from the data of the captured image based on the object ID used for the association, and executes the identification processing of the attribute information on the cut-out part. Accordingly, the monitoring camera 10 can obtain cut-out image data of the best-shot part with a quality suitable for the identification processing from the data of the captured image, and can accurately extract the contents of a large number of attribute information items for the same object by the identification processing of each piece of cut-out image data.

In addition, the monitoring camera 10 further includes a transmission unit 15 that transmits the identification result (identification processing result) of the attribute information of each characteristic part of the object and the information related to the best shot (best shot information) to a server (for example, the client server 20) communicably connected to the monitoring camera 10. Accordingly, the client server 20 can store the identification result of the attribute information for each object (for example, a person) obtained by the monitoring camera 10 and the information on the best shot used for the identification processing in association with each other, and thus it is possible to improve the accuracy of the search processing related to the object.

In addition, the object is at least one person. The plurality of parts include a scapula upper portion of a person and a whole body of a person or a face of a person. Accordingly, the monitoring camera 10 can comprehensively extract, for one or more persons appearing in the monitoring area, various attribute information, which is characteristic information of a person, from images of the characteristic parts of each person.

In addition, the processor 13 identifies at least one of the gender, age, hairstyle, hair color, beard, presence/absence of a mask, and presence/absence of glasses of a person based on the cut-out image of the scapula upper frame indicating the scapula upper portion. The processor 13 identifies at least one of the clothing type, clothing color, bag, and muffler of the person based on the cut-out image of the whole body frame indicating the whole body. Accordingly, the monitoring camera 10 can extract at least one of the gender, age, hairstyle, hair color, beard, presence/absence of a mask, and presence/absence of glasses with high accuracy based on the cut-out image of the scapula upper frame indicating the scapula upper portion of the person. In addition, the monitoring camera 10 can extract at least one of the clothing type, clothing color, bag, and muffler of the person with high accuracy based on the cut-out image of the whole body frame indicating the whole body of the person.

In addition, the object is a plurality of persons and vehicles. The plurality of parts include the whole body of the person and the entire vehicle. Accordingly, the monitoring camera 10 can associate a pair of objects of different types (for example, a person and a bicycle) reflected in the data of the captured image with the same object ID.

In addition, the monitoring camera 10 according to Embodiment 1 includes the imaging unit 11 that captures an image of at least one object (for example, a person) within the angle of view, and the processor 13 that is equipped with the artificial intelligence (AI) and that detects the characteristic parts (for example, a whole body, a scapula upper portion, and a face) of the object reflected in the data of the captured image input from the imaging unit 11 based on the artificial intelligence. The processor 13 determines whether or not a detection part (for example, the whole body, the scapula upper portion, or the face), which is a part detected based on the artificial intelligence, is a priority part (for example, a scapula upper portion) suitable for the tracking processing of the object. When it is determined that the detection part is the priority part (see (Method 1) described above), the processor 13 uses the priority part as the tracking frame to perform the tracking processing of the object.

Accordingly, the monitoring camera 10 can use, as a tracking frame, a priority part (for example, a scapula upper portion) suitable for tracking an object (for example, a person) reflected in the data (video data) of the captured image within the angle of view set in the monitoring area, and thus it is possible to support improvement of the tracking accuracy of the object reflected in the video data within the angle of view. Therefore, for example, even when a part or the whole of a shield (for example, a desk, a multifunction machine, or a wall) placed in front of a person as viewed from the monitoring camera 10 hides the person, the client server 20 can perform efficient search processing by using the priority part (for example, a scapula upper portion) as the tracking frame.

In addition, when it is determined that the detection part is not a priority part (see (Method 2) to (Method 4) described above), the processor 13 generates a tracking frame based on the detection part and executes the tracking processing of the object. Accordingly, even in a situation in which it is difficult to detect the priority part depending on the movement or the posture of the object, the monitoring camera 10 can generate the tracking frame by estimating the position of the priority part (for example, the scapula upper portion) from the information (for example, the coordinates indicating the position in the captured image of the detection part) for specifying the detection part (for example, the whole body or the face) that is not the priority part, and thus the monitoring camera 10 can execute the tracking processing of the object accurately as a whole.
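For illustration, the selection between using the priority part directly and falling back to a frame generated from another detection part could look like the following Python sketch. The fallback order and the simplified reuse of the detection-part box are assumptions for this example; a finer estimation from the face and whole body frames is sketched after the following paragraphs.

```python
from typing import Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in captured-image coordinates

PRIORITY_PART = "scapula_upper"  # the part assumed most suitable for tracking in this sketch

def select_tracking_frame(detected_parts: Dict[str, Box]) -> Optional[Box]:
    """Use the priority part directly when it was detected; otherwise fall back
    to a frame derived from whichever other part was detected."""
    if PRIORITY_PART in detected_parts:
        return detected_parts[PRIORITY_PART]       # priority part found: use it as-is
    for part in ("face", "whole_body"):            # fallback order is an assumption
        if part in detected_parts:
            return detected_parts[part]            # simplified: reuse the detection-part box
    return None                                    # nothing usable was detected in this image

print(select_tracking_frame({"scapula_upper": (112, 60, 50, 60)}))
print(select_tracking_frame({"whole_body": (100, 50, 80, 200)}))
```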

In addition, the object is at least one person. The priority part is the scapula upper portion of the person. Accordingly, the monitoring camera 10 can perform tracking processing with high accuracy for one or more persons appearing in the monitoring area by using information (for example, coordinates indicating the position in the captured image of the scapula upper portion) for specifying the scapula upper frame portion indicating the scapula upper portion of each person.

In addition, when it is determined that the detection part is only the face of the person, the processor 13 generates the tracking frame based on the face frame information of the person (that is, information such as coordinates in the captured image that specify the face frame portion). Accordingly, even when, depending on the movement or posture of the person, only the face can be detected by the AI, the monitoring camera 10 can estimate the position of the priority part (for example, the scapula upper portion) with high accuracy by using the face frame information (see above), and thus it is possible to suppress deterioration of the accuracy of the tracking processing for the person.

In addition, when it is determined that the detection part is only the whole body of the person, the processor 13 generates the tracking frame based on the whole body frame information of the person (that is, information such as coordinates in the captured image that specify the whole body frame portion). Accordingly, even when, depending on the movement or posture of the person, only the whole body can be detected by the AI, the monitoring camera 10 can estimate the position of the priority part (for example, the scapula upper portion) with high accuracy by using the whole body frame information (see above), and thus it is possible to suppress deterioration of the accuracy of the tracking processing for the person.

In addition, when it is determined that the detection part is only the face and the whole body of the person, the processor 13 generates a tracking frame based on the face frame information of the person (see above) and the whole body frame information (see above). Accordingly, even when, depending on the movement or posture of the person, only the whole body and the face can be detected by the AI, the monitoring camera 10 can estimate the position of the priority part (for example, the scapula upper portion) with high accuracy by using both the whole body frame information (see above) and the face frame information (see above), and thus it is possible to suppress deterioration of the accuracy of the tracking processing for the person.
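As a purely illustrative sketch of the estimation described in the preceding paragraphs, the scapula upper frame could be approximated from the face frame, from the whole body frame, or from both, as follows. The geometric ratios used here are assumptions for the example and are not taken from the disclosure.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)

def frame_from_face(face: Box) -> Box:
    """Widen the face box and extend it downward toward the shoulders (assumed ratios)."""
    x, y, w, h = face
    return (x - w // 2, y, w * 2, h * 2)

def frame_from_whole_body(body: Box) -> Box:
    """Take roughly the upper quarter of the whole-body box (assumed ratio)."""
    x, y, w, h = body
    return (x, y, w, h // 4)

def frame_from_face_and_body(face: Box, body: Box) -> Box:
    """Average the two independent estimates component-wise."""
    f, b = frame_from_face(face), frame_from_whole_body(body)
    return tuple((fv + bv) // 2 for fv, bv in zip(f, b))

face_box = (130, 55, 25, 30)
body_box = (100, 50, 80, 200)
print(frame_from_face(face_box))                      # only the face was detected
print(frame_from_whole_body(body_box))                # only the whole body was detected
print(frame_from_face_and_body(face_box, body_box))   # face and whole body were detected
```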

Although various embodiments have been described with reference to the accompanying drawings, the present disclosure is not limited to the examples in the embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions, additions, deletions, and equivalents can be conceived within the scope of the claims, and it should be understood that these also belong to the technical scope of the present invention. Components in the various embodiments described above may be freely combined without departing from the spirit of the invention.

The present disclosure is useful as a monitoring camera, a part association method, and a program for supporting improvement of the search accuracy of one or more objects reflected in video data within an angle of view.

What is claimed is:
1. A monitoring camera, comprising: a capturing unit that is configured to capture an image of at least one object in an angle of view; and a processor that is equipped with artificial intelligence and that is configured to detect a plurality of characteristic parts of the object reflected in a captured image input from the capturing unit based on the artificial intelligence, wherein the processor associates, for each of the at least one object, information for specifying each of the plurality of detected characteristic parts with a same object ID.
2. The monitoring camera according to claim 1, wherein the processor performs tracking processing of the object based on an association processing result of each of the plurality of characteristic parts associated with the same object ID.
3. The monitoring camera according to claim 1, wherein the processor determines whether or not a part to which an object ID is assigned is a best shot suitable for identification processing of attribute information of an object corresponding to the object ID, and in a case where it is determined that the part to which the object ID is assigned is the best shot, cuts out the part determined to be the best shot from the captured image based on the object ID, and performs the identification processing of the attribute information on the cut-out part.
4. The monitoring camera according to claim 3, further comprising: a transmission unit that is configured to transmit an identification result of the attribute information for each of the plurality of characteristic parts and information related to the best shot to a server communicably connected to the monitoring camera.
5. The monitoring camera according to claim 3, wherein the at least one object comprises at least one person, and the plurality of characteristic parts include a scapula upper portion of the person, the scapula upper portion containing a scapula and a portion above the scapula, and a whole body of the person or a face of the person.
6. The monitoring camera according to claim 5, wherein the processor identifies at least one of gender, age, hairstyle, hair color, beard, presence or absence of mask, and presence or absence of glasses of the person based on a scapula upper frame image indicating the scapula upper portion of the person, and identifies at least one of a clothing type, a clothing color, a bag, and a muffler of the person based on a whole body frame image indicating the whole body of the person.
7. The monitoring camera according to claim 1, wherein the at least one object comprises a plurality of persons and a vehicle, and the plurality of characteristic parts include a whole body of each of the plurality of persons and an entire portion of the vehicle.
8. A part association method performed by a monitoring camera equipped with artificial intelligence, the part association method comprising: capturing an image of at least one object in an angle of view, detecting a plurality of characteristic parts of the object reflected in an input captured image based on the artificial intelligence, and associating, for each of the at least one object, information for specifying each of the plurality of detected characteristic parts with a same object ID.
9. A monitoring camera, comprising: a capturing unit that is configured to capture an image of at least one object in an angle of view; and a processor that is equipped with artificial intelligence and that is configured to detect a characteristic part of the object reflected in a captured image input from the capturing unit based on the artificial intelligence, wherein the processor determines whether or not a detection part detected based on the artificial intelligence is a priority part suitable for tracking processing of the object, and uses the priority part as a tracking frame to perform the tracking processing of the object in a case where it is determined that the detection part is the priority part.
10. The monitoring camera according to claim 9, wherein the processor generates the tracking frame based on the detection part in a case where it is determined that the detection part is not the priority part, and performs the tracking processing of the object.
11. The monitoring camera according to claim 9, wherein the at least one object comprises at least one person, and the priority part is a scapula upper portion of the person containing a scapula and a portion above the scapula.
12. The monitoring camera according to claim 11, wherein the processor generates the tracking frame based on face frame information of the person in a case where it is determined that the detection part is only a face of the person.
13. The monitoring camera according to claim 11, wherein the processor generates the tracking frame based on whole body frame information of the person in a case where it is determined that the detection part is only a whole body of the person.
14. The monitoring camera according to claim 11, wherein the processor generates the tracking frame based on face frame information and whole body frame information of the person in a case where it is determined that the detection part is only a face and a whole body of the person.
15. A tracking frame generation method performed by a monitoring camera equipped with artificial intelligence, the tracking frame generation method comprising: capturing an image of at least one object in an angle of view, detecting a characteristic part of the object reflected in an input captured image based on the artificial intelligence, determining whether or not a detection part detected based on the artificial intelligence is a priority part suitable for tracking processing of the object, and using the priority part as a tracking frame to perform the tracking processing of the object in a case where it is determined that the detection part is the priority part.